# Development of Portable High Performance Computing System by Parallel FDTD Dedicated Computers

Yuya Fujita, Hideki Kawaguchi

Graduate School of Engineering, Muroran Institute of Technology, 27-1 Mizumoto-cho, Muroran 050-8585, Japan. E-mail: s1524082@mmm.muroran-it.ac.jp, kawa@mmm.muroran-it.ac.jp

Abstract: As one approach to portable high performance computing (HPC) for industrial applications, the authors have been developing an FDTD dedicated computer for microwave simulation. It has been shown that the dedicated computer, by using a highly optimized memory access architecture, can execute the FDTD scheme very efficiently. Aiming at higher computational performance, this paper presents the development of a parallel computing system built from FDTD dedicated computers.

## I. INTRODUCTION

In recent years, microwave simulation has been widely introduced and effectively used in industrial applications such as the design of high-frequency printed circuit boards. As the numerical models have gradually become larger and more complicated, high performance computing (HPC) is strongly required. In general, a supercomputer or a PC cluster is employed for such HPC. However, these systems are large pieces of hardware and consume a large amount of power. In addition, they are not intended for a single user. Accordingly, industrial applications require a different type of HPC system that is convenient for product design.

As one approach to such portable HPC with small hardware and low power consumption, the authors have developed an FDTD dedicated computer for microwave simulation. It has been shown that the dedicated computer, by using a highly optimized memory access architecture, can execute the FDTD scheme very efficiently<sup>[1]-[3]</sup>. This paper presents the development of a parallel computing system of FDTD dedicated computers for higher performance calculation.

## II. ARCHITECTURE OF FDTD DEDICATED COMPUTER

The FDTD dedicated computer that we have developed so far is summarized in this section<sup>[1]-[3]</sup>. Figure 1 shows the architecture of the FDTD dedicated computer. The dedicated computer is composed of two parts, a calculation module and a memory module. The memory module is composed of 14 RAMs. Owing to this 14-RAM structure, all the data needed for the FDTD calculation of the three field components on one grid can be loaded into the calculation module in a single memory access. The calculation module, in turn, is composed of a pipeline processing circuit and a register array for buffering calculation results. Figure 2 shows the memory access and calculation timing of the dedicated computer. The calculation time ("CALCULATION") of the FDTD scheme is hidden within the memory access time ("128 BURST READ/WRITE") by pipeline processing and buffering in the register array, which means that a highly optimized FDTD calculation system, including the memory module, is achieved.
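To illustrate why several operands have to be fetched for every grid point, the following is a minimal sketch of the textbook Yee update of the three electric field components at one cell. It is not the authors' pipeline circuit: the function names `update_e_cell` and `make_field`, the nested-list storage, and the uniform coefficient `dt_over_eps` are all assumptions made for illustration.

```python
def make_field(n, f):
    """Build an n x n x n nested list whose entries are f(i, j, k)."""
    return [[[f(i, j, k) for k in range(n)] for j in range(n)] for i in range(n)]


def update_e_cell(E, H, i, j, k, dt_over_eps=1.0, dx=1.0, dy=1.0, dz=1.0):
    """Return updated (Ex, Ey, Ez) at cell (i, j, k) with the standard Yee scheme.

    E and H are tuples of three fields indexed as field[i][j][k].
    The inputs are not modified.
    """
    Ex, Ey, Ez = E
    Hx, Hy, Hz = H
    # dEx/dt = (dHz/dy - dHy/dz) / eps
    ex = Ex[i][j][k] + dt_over_eps * (
        (Hz[i][j][k] - Hz[i][j - 1][k]) / dy - (Hy[i][j][k] - Hy[i][j][k - 1]) / dz
    )
    # dEy/dt = (dHx/dz - dHz/dx) / eps
    ey = Ey[i][j][k] + dt_over_eps * (
        (Hx[i][j][k] - Hx[i][j][k - 1]) / dz - (Hz[i][j][k] - Hz[i - 1][j][k]) / dx
    )
    # dEz/dt = (dHy/dx - dHx/dy) / eps
    ez = Ez[i][j][k] + dt_over_eps * (
        (Hy[i][j][k] - Hy[i - 1][j][k]) / dx - (Hx[i][j][k] - Hx[i][j - 1][k]) / dy
    )
    return ex, ey, ez


# Usage: Hz grows linearly in y, so only Ex changes.
zero = make_field(3, lambda i, j, k: 0.0)
Hz = make_field(3, lambda i, j, k: float(j))
result = update_e_cell((zero, zero, zero), (zero, zero, Hz), 1, 1, 1)
```

In this sketch one cell update reads three electric and nine magnetic field values from the cell and its neighbours, which is the kind of multi-operand access that a wide, parallel memory layout can serve in one step.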

## III. ARCHITECTURE OF PARALLEL COMPUTING SYSTEM BY DEDICATED COMPUTERS

Figure 2 Calculation timing and memory access timing

In parallel FDTD calculation by the domain decomposition method, data communication between domains is normally carried out after the FDTD calculation for all grids has finished. The communication overhead therefore becomes serious when the number of subdomains is large. To reduce this overhead, the FDTD calculation and the data communication should be executed simultaneously in the parallel FDTD dedicated computers. Figures 3 and 4 show, respectively, the configuration of the domain decomposition and the timing chart of the dedicated computers for the case of two subdomains. The burst read operation from the memory module to the calculation module in the two computers is carried out in the order 1-2-3-4 and 5-6-7-8, respectively. The FDTD calculation on each computer, on the other hand, is executed in the order 2-3-4-1 and 6-7-8-5, and the burst write to the RAM follows the same order. By adopting this ordering, the FDTD calculation of the edge grids such as 1 and 5 can be executed without any latency, and a highly parallelized computing system is constructed.
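The effect of the rotated calculation order can be illustrated with a toy schedule model. This is only a sketch under strong assumptions that are not taken from the paper: every grid block takes one time slot to read and one to calculate, the edge block (the first block in the read order) needs boundary data that the neighbouring computer reads in its last read slot, and that data arrives `comm_latency` slots later. The function name `edge_stall` and all parameters are hypothetical.

```python
def edge_stall(read_order, calc_order, comm_latency=1):
    """Count slots the calculation waits for boundary data (toy model)."""
    n = len(read_order)
    # Time at which each block has been read from memory.
    read_done = {block: slot + 1 for slot, block in enumerate(read_order)}
    # Assumption: the neighbour finishes reading its last block at time n,
    # and the boundary data arrives comm_latency slots after that.
    halo_ready = n + comm_latency
    edge = read_order[0]
    t = 0
    stall = 0
    for block in calc_order:
        start = max(t, read_done[block])  # a block must be read before use
        if block == edge and start < halo_ready:
            stall += halo_ready - start   # wait for the neighbour's data
            start = halo_ready
        t = start + 1
    return stall


natural = edge_stall([1, 2, 3, 4], [1, 2, 3, 4])  # edge block calculated first
rotated = edge_stall([1, 2, 3, 4], [2, 3, 4, 1])  # edge block calculated last
```

Under these assumptions the natural order stalls for 4 slots while the rotated order 2-3-4-1 does not stall at all, because the communication proceeds while the interior blocks are being calculated.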

## IV. HARDWARE OPERATION

A parallel computing system of six FDTD dedicated computers is shown in Figure 5. Each dedicated computer is composed of one FPGA and fourteen SDRAMs; it can handle 256 x 256 x 128 grids, and its clock frequency is 51 MHz. As a numerical example, a simple rectangular waveguide discretized by 1536 x 256 x 128 grids is considered. The TE10 mode at 1.0 GHz is excited in the middle part of the waveguide, and 4-layer PMLs are placed at both ends of the waveguide. Figure 6 shows the distribution of the z component of the electric field at time step 2000. By comparison with a simulation written in C, we confirmed that the dedicated computer operates correctly. A performance comparison of several computer systems is shown in Table 1. The GPU shows the fastest computing speed, but its power consumption is the highest among these systems. The dedicated computer, on the other hand, has the lowest power consumption, and its speed per unit power is about three times that of the GPU. In addition, the performance of the parallel dedicated computer is exactly proportional to the number of dedicated computers.
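The ratios quoted above can be checked directly from the values in Table 1. The short script below is only a worked calculation. The cycles-per-cell figure and the estimated run time for the waveguide example are derived here and are not reported in the paper; the run time assumes that the six-computer rate in Table 1 is sustained for all 2000 time steps.

```python
# Values transcribed from Table 1: (speed in Mcell/s, power in W).
systems = {
    "CPU (Core i7)": (61.7, 206.0),
    "GPU (GeForce GTX 285)": (234.0, 238.0),
    "Dedicated, single": (12.5, 4.0),
    "Dedicated, six parallel": (75.0, 24.0),
}

# Speed per unit power in Mcell/s per watt.
efficiency = {name: speed / power for name, (speed, power) in systems.items()}

# Efficiency of the dedicated system relative to the GPU.
ratio_vs_gpu = efficiency["Dedicated, six parallel"] / efficiency["GPU (GeForce GTX 285)"]

# Speed-up of six computers over one.
scaling = systems["Dedicated, six parallel"][0] / systems["Dedicated, single"][0]

# Derived quantity: clock cycles spent per cell update at 51 MHz.
cycles_per_cell = 51e6 / (systems["Dedicated, single"][0] * 1e6)

# Derived estimate: run time of the 1536 x 256 x 128 example over 2000 steps.
cells = 1536 * 256 * 128
est_seconds = cells * 2000 / (systems["Dedicated, six parallel"][0] * 1e6)

for name, value in efficiency.items():
    print(f"{name}: {value:.3f} Mcell/s/W")
print(f"ratio vs GPU: {ratio_vs_gpu:.2f}, scaling: {scaling:.1f}")
print(f"cycles per cell: {cycles_per_cell:.2f}, estimated run time: {est_seconds:.0f} s")
```

The single and six-parallel configurations give the same speed per watt because both speed and power scale by exactly six in Table 1.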

## V. CONCLUSION

In this paper, a parallel computing system of six dedicated computers has been presented as one portable HPC technology. It was shown that the calculation speed is exactly proportional to the number of connected dedicated computers, because there is no communication overhead, and that the calculation speed per unit power of the dedicated computer is higher than those of the CPU and the GPU. This suggests that a very large parallel system of dedicated computers can be constructed with very low power consumption.

## VI. REFERENCES

- [1] H. Kawaguchi, Y. Fujita, Y. Fujishima and S. Matsuoka, Improved Architecture of FDTD/FIT Dedicated Computer for Higher Performance Computation, *IEEE Transactions on Magnetics* 44(6) (June 2008), pp. 1226–1229
- [2] Y. Fujita and H. Kawaguchi, Full Custom PCB Implementation of FDTD/FIT Dedicated Computer, *IEEE Transactions on Magnetics* 45(3) (March 2009), pp. 1100–1103
- [3] Y. Fujita and H. Kawaguchi, Development of improved memory architecture FDTD/FIT dedicated computer based on SDRAM for large scale microwave simulation, *International Journal of Applied Electromagnetics and Mechanics* 32 (2010), pp. 145–157



Figure 3 Configuration of electromagnetic field on two domains

| Row | Sequence |
|-----|----------|
| Domain 1, data 1 (burst read) | Grid 1, Grid 2, Grid 3, Grid 4 |
| Domain 1, calculation 1 | Grid 2, Grid 3, Grid 4, Grid 1 |
| Domain 2, data 2 (burst read) | Grid 5, Grid 6, Grid 7, Grid 8 |
| Domain 2, calculation 2 | Grid 6, Grid 7, Grid 8, Grid 5 |
| Communication | Overlaps the calculation |
| SDRAM | Read, then Write |
| Calculation | Between the SDRAM read and write |

Figure 4 Operation states of communication and calculation timing

Table 1 Performance comparison of the computer systems

| Computer system    | Specification                          | Speed (Mcell/s) | Power (W) | Speed/Power (Mcell/s/W) |
|--------------------|----------------------------------------|-----------------|-----------|-------------------------|
| CPU                | Core i7 (3.2 GHz), DDR3 triple channel | 61.7            | 206       | 0.30                    |
| GPU                | GeForce GTX 285                        | 234             | 238       | 0.983                   |
| Dedicated computer | Single (51 MHz)                        | 12.5            | 4         | 3.13                    |
| Dedicated computer | Six parallel (51 MHz)                  | 75              | 24        | 3.13                    |

Figure 5 Parallel computing system of FDTD dedicated computers



Figure 6 Ez component of the electric field at time step 2000